In Figure 21-1, observe that each line ends with a code, and there’s a legend at the bottom. Six of the

ten participants (#1, #2, #4, #6, #9, and #10, labeled X) died during the course of the follow-up study.

Two participants (#5 and #7, labeled L) were LFU at some point during the study, and two participants

(#3 and #8, labeled E) were still alive at the end of the study. So this study has four participants — the

Ls and the Es — with censored survival times.
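The tally above can be sketched in a few lines of Python. This is only an illustration: the dictionary names and the 1/0 event coding are assumptions made here (a common convention in survival software), while the participant codes come from the Figure 21-1 description.

```python
# Outcome codes as described for Figure 21-1:
# X = died, L = lost to follow-up (LFU), E = still alive at end of study.
outcomes = {1: "X", 2: "X", 3: "E", 4: "X", 5: "L",
            6: "X", 7: "L", 8: "E", 9: "X", 10: "X"}

# Assumed convention: event = 1 if the death was observed, 0 if censored.
event = {pid: 1 if code == "X" else 0 for pid, code in outcomes.items()}

# Participants whose survival time is censored (the Ls and the Es).
censored_ids = sorted(pid for pid, e in event.items() if e == 0)
n_deaths = sum(event.values())

print("Censored participants:", censored_ids)
print("Observed deaths:", n_deaths)
```

Keeping an event indicator next to each participant's time, rather than deleting the censored rows, is what lets later methods use the partial information those rows carry.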

So, how do you analyze survival data containing censoring? The following sections explain the correct

ways to proceed as well as mistakes to avoid.

Analyzing censored data properly

Statisticians have developed techniques that use the partial information contained in censored

observations. Later in this chapter, we describe two of the most popular: the life-table method and

the Kaplan-Meier (K-M) method. To understand these methods, you first need to understand two

fundamental concepts, hazard and survival:

The hazard rate is the probability, per unit of time, of the participant dying in the next small

interval of time, assuming the participant is alive right now.

The survival rate is the probability of the participant living for at least a certain amount of time

after some starting time point.
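For readers who like symbols, the two definitions can be written as follows. The notation is an assumption introduced here for illustration: T stands for the time from the starting point to death, S(t) for the survival rate at time t, and h(t) for the hazard rate at time t.

```latex
% Assumed notation: T = time from the starting point to death.
\[
S(t) = P(T > t),
\qquad
h(t) = \lim_{\Delta t \to 0}
       \frac{P\bigl(t \le T < t + \Delta t \mid T \ge t\bigr)}{\Delta t}
\]
```

Dividing by the interval length is why a hazard is expressed per unit of time (for example, per year), whereas a survival rate is a plain probability between 0 and 1.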

The first task when analyzing survival data is usually to describe how the hazard and survival rates

vary with time. In this chapter, we show you how to estimate the hazard and survival rates, summarize

them as tables, and display them as graphs. Most of the larger statistical packages (such as those

described in Chapter 4) allow you to do the calculations we describe automatically, so you may never

have to do them manually. But without first understanding how these methods work, it’s almost

impossible to understand any other aspect of survival analysis, so we provide a demonstration for

instructional purposes.

Making mistakes with censored data

Here are two mistakes you need to avoid when working with survival data:

You shouldn’t exclude participants with a censored survival time from any survival analysis!

You shouldn’t replace the censored date with some other value, a practice called imputing. When

you impute numerical data to replace a missing value, it is common to use the last observed value

for that participant (called last observation carried forward, or LOCF, imputation). However, you

should not impute dates in survival analysis.

Neither exclusion nor imputation fixes the missingness in censored data. You can see why in

Figure 21-2, where we’ve slid the timelines for all the participants over to the left as if they all had

their surgery on the same date. The time scale shows survival time in years after surgery instead of

chronological time.
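A small numerical sketch may help make the two mistakes concrete. The survival times below are hypothetical values invented for illustration (they are not read from Figure 21-2); only the died/censored flags follow the participant codes described for Figure 21-1. The helper names are also assumptions, and the `kaplan_meier` function is a bare-bones version of the product-limit calculation, not a substitute for a statistical package.

```python
# Hypothetical survival times in years after surgery (made up for illustration).
times = {1: 1.0, 2: 2.0, 3: 7.0, 4: 3.0, 5: 2.5,
         6: 4.0, 7: 4.5, 8: 6.0, 9: 5.0, 10: 6.5}
# Event flags follow the Figure 21-1 codes: 1 = died, 0 = censored (L or E).
died = {1: 1, 2: 1, 3: 0, 4: 1, 5: 0, 6: 1, 7: 0, 8: 0, 9: 1, 10: 1}

HORIZON = 5.0  # estimate the chance of surviving more than 5 years


def fraction_beyond(ts, horizon):
    """Share of the given times that exceed the horizon."""
    return sum(t > horizon for t in ts) / len(ts)


def kaplan_meier(times, died, horizon):
    """Product-limit survival estimate at `horizon`, keeping censored times."""
    surv = 1.0
    death_times = sorted({times[p] for p in times
                          if died[p] and times[p] <= horizon})
    for t in death_times:
        at_risk = sum(1 for p in times if times[p] >= t)
        deaths = sum(1 for p in times if died[p] and times[p] == t)
        surv *= 1 - deaths / at_risk
    return surv


# Mistake 1: exclude the four censored participants entirely.
excluded = fraction_beyond([times[p] for p in times if died[p]], HORIZON)
# Mistake 2 (one possible imputation): pretend each censored
# participant died on the date they were last seen.
imputed = fraction_beyond(list(times.values()), HORIZON)
# Censoring-aware estimate for comparison.
km = kaplan_meier(times, died, HORIZON)

print(f"excluded: {excluded:.3f}, imputed: {imputed:.3f}, K-M: {km:.3f}")
```

With these made-up numbers, the two shortcuts give five-year survival estimates of about 0.167 and 0.300, both lower than the censoring-aware estimate of about 0.429. The exact values depend entirely on the invented times, but the direction is typical: both shortcuts discard the fact that censored participants were known to be alive up to their last contact.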